Abstract

During the past decade, there has been an extensive investigation of the computational complexity of the 
consistent answers of Boolean conjunctive queries under primary key constraints. Much of this investigation 
has focused on self-join-free Boolean conjunctive queries. In this paper, we study the consistent answers of 
Boolean conjunctive queries involving a single binary relation, i.e., we consider arbitrary Boolean conjunctive 
queries on directed graphs. In the presence of a single key constraint, we show that for each such Boolean 
conjunctive query, either the problem of computing its consistent answers is expressible in first-order logic, 
or it is polynomial-time solvable, but not expressible in first-order logic. 

Keywords: Databases, conjunctive queries, database repairs, consistent answers, key constraints. 


1. Introduction 

Database repairs and consistent query answering, introduced in [1], provide a principled approach to
the problem of managing inconsistency in databases and, in particular, to the problem of giving meaningful
semantics to queries on an inconsistent database. If Σ is a set of integrity constraints, then an inconsistent
database w.r.t. Σ is a database instance I that does not satisfy every constraint in Σ. A repair of an
inconsistent database instance I is a database instance J that satisfies every constraint in Σ and differs from I
in a “minimal way”. The consistent answers of a query q on I are given by the intersection $\bigcap \{q(J) : J \text{ is a repair of } I\}$.
If q is a Boolean query, then computing the consistent answers of q is the following decision problem, denoted
by CERTAINTY(q): given a database instance I, is q(J) true on every repair J of I?

There has been an extensive investigation of the algorithmic properties of consistent query answering 
for different classes of integrity constraints and different types of repairs (see [2] for a survey). Much of
the focus of this investigation has been on the consistent answers of conjunctive queries under primary key 
constraints and subset repairs. Let S be a relational database schema such that every relation in S has a 
single key. A subset repair of a database instance I over S is a maximal (under set inclusion) subinstance 
J of I that satisfies every key constraint of S. It is easy to see that, in this scenario, for every Boolean 
conjunctive query q, we have that CERTAINTY(q) is in coNP. It is also known that, depending on the query
q and the key constraints at hand, the actual computational complexity of CERTAINTY(q) may vary from
being coNP-complete to being FO-rewritable, where FO-rewritable means that there is a first-order expressible
query q' such that, for every database instance I, we have that q is true on every subset repair of I if and
only if q' is true on I.

The preceding state of affairs gave rise to a research program aiming to classify the computational 
complexity of CERTAINTY(q), where q is a Boolean conjunctive query under primary key constraints and
subset repairs. After a sequence of partial results by several different researchers, a breakthrough
trichotomy result was recently announced by Koutris and Wijsen.
Specifically, in [12], Koutris and Wijsen showed that for every self-join-free Boolean conjunctive query q,
one of the following three statements holds: (a) CERTAINTY(q) is coNP-complete; (b) CERTAINTY(q) is in
PTIME; (c) CERTAINTY(q) is FO-rewritable. Moreover, there is an algorithm that, given such a query q,
determines which of these three statements holds for CERTAINTY(q).

The hypothesis that the Boolean conjunctive queries considered have no self-joins plays a crucial role 
in the proof of the trichotomy theorem in [12]. As a matter of fact, essentially all the earlier work on the
classification of CERTAINTY(q) is about self-join-free conjunctive queries, since most of the currently available
techniques cannot handle the presence of self-joins. Two notable exceptions are coNP-hardness results for
specific Boolean conjunctive queries with self-joins in [3] and a broad sufficient condition for FO-rewritability
of Boolean conjunctive queries involving a single relation in [7].

In this paper, we investigate the algorithmic aspects of CERTAINTY(q), where q is a Boolean conjunctive
query over a single binary relation (hence, the query has self-joins, provided it has at least two atoms).
In other words, we investigate the complexity of computing the consistent answers of arbitrary Boolean
conjunctive queries on directed graphs. Our main focus is on the case in which there is a single key constraint,
i.e., we focus on Boolean conjunctive queries over a single binary relation in which one of the attributes
is a key. We show that if q is such a conjunctive query, then either CERTAINTY(q) is FO-rewritable, or
CERTAINTY(q) is in PTIME, but it is not FO-rewritable. In addition, we characterize when each of these
two cases occurs. More precisely, we first point out that every Boolean conjunctive query q over a binary
relation and with one of its attributes as a key is equivalent to either a path query or a collection of disjoint
cycles. We then show that if q is a path query or the query “there is a self-loop”, then CERTAINTY(q) is
FO-rewritable; in contrast, if q is a collection of disjoint cycles each of length at least 2, then CERTAINTY(q)
is in PTIME, but it is not FO-rewritable.

It should be pointed out that Maslowski and Wijsen [13] have established a dichotomy theorem for the 
problem #CERTAINTY(q) of counting the number of subset repairs satisfying a Boolean conjunctive query q
that may contain self-joins: for each such query q, either #CERTAINTY(q) is in FP (the class of polynomial-time
solvable counting problems), or #CERTAINTY(q) is #P-complete. When this result is applied to the case
of Boolean conjunctive queries over a single binary relation, it is not hard to verify that #CERTAINTY(q)
is in FP only when q is equivalent to one of the queries “there is a path of length 1”, “there is a path of length
2”, “there is a self-loop”; for all other queries q, it turns out that #CERTAINTY(q) is #P-complete. Thus, for
Boolean conjunctive queries q over a single binary relation, the dividing line between FO-rewritability and
PTIME-computability for CERTAINTY(q) is substantially different from the dividing line between membership
in FP and #P-completeness for #CERTAINTY(q).

2. Preliminaries 

In general, a relational database schema or, simply, a schema is a finite collection R of relation symbols, 
each with an associated arity. Here, we will consider a schema R consisting of a single binary relation R. 
We will review some of the basic notions of relational database theory for this particular setting. 

A relational database instance over R or, simply, an instance over R is a binary relation, which, for 
notational simplicity, we will also denote by R. A fact is an expression R(a,b), where a and b are values 
such that (a, b) ∈ R. An instance over R can be thought of as a graph such that there is an edge from a
node a to a node b if R(a, b) is a fact of R. 

We assume that the relation symbol R has a single key and that, actually, the first attribute of R is a
key. A consistent instance or a consistent graph is a binary relation R that satisfies the key constraint, i.e.,
it does not contain two facts of the form R(a,b) and R(a,b') with b ≠ b'. An inconsistent instance or an
inconsistent graph is a binary relation R that violates the key constraint, i.e., R contains two facts R(a,b)
and R(a,b') with b ≠ b'.

A subset repair or, simply, a repair of an instance R is a maximal consistent sub-instance of R; in other
words, a repair of R is an instance R' ⊆ R that satisfies the key constraint and such that there is no instance
R'' with the property that R' ⊊ R'' ⊆ R and R'' satisfies the key constraint.

Let q be a boolean query over the schema R. 

• CERTAINTY(q) is the following decision problem: given an instance R, is q true on every repair of R?

• #CERTAINTY(q) is the following counting problem: given an instance R, find the number of repairs of R
that satisfy q.
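
The two problems can be made concrete with a brute-force sketch. It assumes that an instance is represented as a Python set of pairs (a, b) with the first component as the key; the function names (`repairs`, `certainty`, `count_certainty`) and the two sample queries are illustrative choices, not notation from the paper. The enumeration is exponential in general and is meant only to illustrate the definitions.

```python
from itertools import product


def repairs(instance):
    """Yield every subset repair of a set of (a, b) facts keyed on the first attribute."""
    blocks = {}
    for a, b in sorted(instance):
        blocks.setdefault(a, []).append((a, b))
    # A repair keeps exactly one fact out of each block of facts sharing a key value.
    for choice in product(*blocks.values()):
        yield frozenset(choice)


def certainty(query, instance):
    """Decision problem: is the boolean query true on every repair?"""
    return all(query(r) for r in repairs(instance))


def count_certainty(query, instance):
    """Counting problem: number of repairs on which the boolean query is true."""
    return sum(1 for r in repairs(instance) if query(r))


def has_self_loop(r):
    return any(a == b for a, b in r)


def has_two_edge_path(r):
    # True when some fact R(a, b) is followed by a fact R(b, c).
    return any(b == c for _, b in r for c, _ in r)


R = {(1, 1), (1, 2), (2, 3)}
print(certainty(has_self_loop, R), count_certainty(has_self_loop, R))
print(certainty(has_two_edge_path, R))
```

On this sample instance there are two repairs, only one of which contains the self-loop, while both contain a path with two edges.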

In this paper, we focus on conjunctive queries. By definition, a conjunctive query over a schema R is a 
first-order formula built from atomic formulas of R using conjunction and existential quantification. If R 
consists of a single binary relation R, then every conjunctive query is logically equivalent to an expression
of the form $q(\mathbf{z}) = \exists \mathbf{w}\,(R(\mathbf{x}_1) \land \cdots \land R(\mathbf{x}_m))$, where each $\mathbf{x}_i$ is a pair of variables, $\mathbf{z}$ and $\mathbf{w}$ are tuples of
variables, and each variable in $\mathbf{x}_1, \ldots, \mathbf{x}_m$ appears in exactly one of $\mathbf{z}$ and $\mathbf{w}$. A boolean conjunctive query is
a conjunctive query in which all variables are existentially quantified, i.e., z is the empty tuple. 

The canonical database of a boolean conjunctive query q is the instance $D_q$ obtained by viewing each
variable in the query as a distinct value and each atom as a fact of $D_q$. For example, if q is the boolean
conjunctive query $\exists x, y, z\,(R(x,y) \land R(y,z) \land R(z,x))$, then $D_q$ consists of the facts R(x,y), R(y,z), R(z,x).

Two conjunctive queries q and q' are equivalent if for every instance R, we have that q(R) = q'(R).
Starting with the work of Chandra and Merlin there has been an extensive study of conjunctive-query 
equivalence and minimization. A conjunctive query q is minimized if there is no other conjunctive query 
q' which is equivalent to q and has fewer atoms in its definition than q has. It is well known that every 
conjunctive query is equivalent to a unique (up to a renaming of the variables) minimized conjunctive query. 
In terms of canonical databases, if we view the canonical database of a boolean conjunctive query q as a 
graph G, then the canonical database of the minimized query q' is the core of the graph G, that is to say,
a subgraph G' of G such that there is a homomorphism from G to G', but no homomorphism from G to a
proper subgraph G'' of G' (recall that a homomorphism from G to G' is a mapping h from the nodes of G to
the nodes of G' such that if (u, v) is an edge of G, then (h(u), h(v)) is an edge of G').

Two conjunctive queries q and q' are equivalent under the key constraint of the binary relation R if for 
every consistent instance R, we have that q(R) = q'(R). Clearly, if q and q' are equivalent under the key
constraint of R, then CERTAINTY(q) coincides with CERTAINTY(q'); similarly, #CERTAINTY(q) coincides with
#CERTAINTY(q'). Conjunctive query equivalence under various integrity constraints has been investigated
in various settings in the past.

3. Conjunctive-Query Equivalence under a Key Constraint 

We will analyze conjunctive-query equivalence under the key constraint of the binary relation symbol R. 
For example, consider the conjunctive query $\exists x, y, z\,(R(x,y) \land R(x,z) \land R(y,z))$, where the first attribute
of R is a key. Observe that, if this query evaluates to true on a consistent instance, then the variables y
and z must be instantiated to the same value. Hence, under the key constraint, this query is equivalent to
$\exists x, y, z\,(R(x,y) \land R(x,y) \land R(y,y))$, which, in turn, is equivalent to $\exists x, y\,(R(x,y) \land R(y,y))$. We shall show
that every boolean conjunctive query is equivalent under the key constraint to a boolean conjunctive query 
that has a rather simple form. As a first step, we analyze the structure of consistent instances. 

Proposition 1. Let R be a schema consisting of a single binary relation symbol with the first attribute as 
key. An instance R is consistent if and only if R, when viewed as a graph, is the union of a forest of trees 
oriented from the leaves to the root and of simple cycles whose nodes either are not in the forest or are roots 
of some trees of the forest. 

Proof. The direction from right to left is obvious. For the other direction, suppose that R is a consistent 
instance. Let C be a simple cycle of R. If v is a node on C, then the only outgoing edge from v is the edge
that goes to the next node on C (otherwise, R is inconsistent). Thus, there are no edges from a node of C to 
some node outside C. Moreover, a directed acyclic graph is a consistent instance if and only if it is a forest 
of trees oriented from the leaves to the root. It follows that R consists of a set of disjoint simple cycles and 
a set of disjoint trees oriented from the leaves to the root, where the root of such a tree may possibly also 
be on one of the cycles. □ 
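
The structure described in Proposition 1 can be explored with a short sketch. It assumes an instance is a Python set of pairs with the first component as the key; `is_consistent` and `cycle_nodes` are hypothetical helper names introduced here for illustration. In a consistent instance every node has at most one successor, so following successors from any node either stops at a root or enters a simple cycle.

```python
def is_consistent(instance):
    """True when no two facts share the same key (first component)."""
    keys = [a for a, _ in instance]
    return len(keys) == len(set(keys))


def cycle_nodes(instance):
    """Return the set of nodes lying on simple cycles of a consistent instance."""
    succ = dict(instance)  # each key has exactly one successor
    on_cycle = set()
    for start in succ:
        walk, position = [], {}
        u = start
        # Follow successors until the walk stops, repeats, or meets a known cycle.
        while u in succ and u not in position and u not in on_cycle:
            position[u] = len(walk)
            walk.append(u)
            u = succ[u]
        if u in position:
            # The walk closed on itself: everything from the first visit of u is a cycle.
            on_cycle.update(walk[position[u]:])
    return on_cycle


G = {(1, 2), (2, 3), (3, 2), (4, 3), (5, 6)}
print(is_consistent(G), sorted(cycle_nodes(G)))
```

In the sample instance, nodes 2 and 3 form a 2-cycle, nodes 1 and 4 belong to trees whose roots lie on that cycle, and the edge from 5 to 6 is a tree whose root 6 is on no cycle.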

Let us return to the boolean conjunctive query $\exists x, y, z\,(R(x,y) \land R(x,z) \land R(y,z))$. As seen earlier, this
query is equivalent under the key constraint to the boolean conjunctive query $\exists x, y\,(R(x,y) \land R(y,y))$. The
canonical database of the latter consists of the facts R(x,y) and R(y,y), hence its core consists of just the
fact R(y,y). It follows that $\exists x, y, z\,(R(x,y) \land R(x,z) \land R(y,z))$ is equivalent under the key constraint to the
existence-of-a-self-loop query $\exists y\,R(y,y)$. It turns out that, by first applying repeatedly the key constraint
and then minimizing, every boolean conjunctive query is equivalent under the key constraint to one that has
a simple structure.

Definition 1. Let R be a schema consisting of a single binary relation symbol. 

• For every n ≥ 2, we write n-PATH to denote the boolean conjunctive query that asserts the existence of
a path of length n, i.e., n-PATH is of the form $\exists x_1, \ldots, x_n\,(R(x_1,x_2) \land \cdots \land R(x_{n-1},x_n))$.

We say that a boolean conjunctive query is a simple path query if it is the n-PATH query, for some n ≥ 2.

• For every n ≥ 1, we write n-CYCLE to denote the boolean conjunctive query that asserts the existence
of a simple cycle of length n, i.e., n-CYCLE is of the form $\exists x_1, \ldots, x_n\,(R(x_1,x_2) \land \cdots \land R(x_n,x_1))$.

We say that a boolean conjunctive query is a cycle query if it is the n-CYCLE query, for some n ≥ 1. We
also say that a boolean conjunctive query is a disjoint collection of simple cycles if it is the conjunction
of simple cycle queries with no variables in common.

For example, the query $\exists x_1, \ldots, x_5\,(R(x_1,x_2) \land R(x_2,x_1) \land R(x_3,x_4) \land R(x_4,x_5) \land R(x_5,x_3))$ is the disjoint
collection of the 2-Cycle query and the 3-Cycle query.

Theorem 1. Let R be a schema consisting of a single binary relation symbol with the first attribute as key. 
Every boolean conjunctive query over R is equivalent under the key constraint either to a path query or to a 
query that is a disjoint collection of cycles such that the length of each cycle in the collection does not divide 
the length of any other cycle in the collection. Moreover, there is a polynomial-time algorithm that, given a 
boolean conjunctive query over R, decides which of these two cases holds. 

Proof. Let q be a boolean conjunctive query over R. First, form the finest partition of the variables of q 
such that if we replace all variables in a single part of the partition with a fresh variable, then the canonical 
database $D_p$ of the resulting boolean conjunctive query p is a consistent instance. Intuitively, this is achieved
by considering all atoms with the same variable, say x, in the first attribute and by replacing all occurrences
of variables that appear in the second attribute of these atoms with the same fresh variable x', and by
repeating this step until the canonical database is consistent. Clearly, q is
equivalent under the key constraint to p. Since the canonical database $D_p$ of p is a consistent instance,
Proposition 1 implies that $D_p$ is the union of a forest of trees oriented from the leaves to the root
and of simple cycles whose nodes either are not in the forest or are roots of some trees of the forest. If $D_p$ is
actually an acyclic graph, then the core of $D_p$ is a simple path, hence q is equivalent under the key constraint
to a path query. If $D_p$ contains at least one cycle, then its core is a collection of disjoint cycles such that
the length of each cycle in the collection does not divide the length of any other cycle in the collection (note 
that every tree can be homomorphically mapped to any cycle). It follows that, in this case, q is equivalent 
under the key constraint to a disjoint collection of cycles such that the length of each cycle in the collection 
does not divide the length of any other cycle in the collection. □ 
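
The two steps of the proof (merging variables forced equal by the key constraint, then taking the core) can be sketched as follows. This is an illustrative reading of the argument, not the authors' algorithm: a query is assumed to be given as a list of atoms (x, y), the names `chase_key` and `classify` are hypothetical, and a path is reported by its number of edges. The sketch uses the facts that a cycle of length k maps homomorphically onto a cycle of length j exactly when j divides k, and that an acyclic consistent instance maps onto its longest path.

```python
def chase_key(atoms):
    """Merge variables until no two atoms share a first variable with different second variables."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    changed = True
    while changed:
        changed = False
        target = {}
        for x, y in atoms:
            fx, fy = find(x), find(y)
            if fx in target and find(target[fx]) != fy:
                parent[find(target[fx])] = fy  # the key constraint forces equality
                changed = True
            target.setdefault(fx, fy)
    return {(find(x), find(y)) for x, y in atoms}


def classify(atoms):
    """Return ('cycles', sorted minimal cycle lengths) or ('path', number of edges)."""
    succ = dict(chase_key(atoms))
    lengths, done = set(), set()
    for start in succ:
        position, u = {}, start
        while u in succ and u not in position and u not in done:
            position[u] = len(position)
            u = succ[u]
        if u in position:
            lengths.add(len(position) - position[u])
        done.update(position)
    if lengths:
        # Keep only lengths that are not proper multiples of another cycle length.
        return ("cycles", sorted(k for k in lengths
                                 if not any(j != k and k % j == 0 for j in lengths)))
    depth = {}

    def edges_to_root(u):
        if u not in succ:
            return 0
        if u not in depth:
            depth[u] = 1 + edges_to_root(succ[u])
        return depth[u]

    return ("path", max(edges_to_root(u) for u in succ))


print(classify([("x", "y"), ("x", "z"), ("y", "z")]))
```

For the running example, the chase merges y and z, and the only cycle that survives is a self-loop, in agreement with the discussion preceding Definition 1.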


4. First-Order Rewritability 

Let q be a boolean conjunctive query over some relational schema S. We say that CERTAINTY(q) is FO-rewritable
if there is a boolean first-order query q' over S such that for every instance I of S, we have that every
repair of I satisfies q if and only if I satisfies q'. For self-join-free conjunctive queries, a systematic study of
when CERTAINTY(q) is FO-rewritable was carried out first by Fuxman and Miller [1] and then by Wijsen [9].
In this section, we obtain the following characterization of FO-rewritability of boolean conjunctive queries 
over a schema consisting of a single binary relation with a single key constraint. 

Theorem 2. Let R be a schema consisting of a single binary relation symbol with the first attribute as the 
key. If q is a boolean conjunctive query over R, then the following two statements are equivalent.

• CERTAINTY(q) is FO-rewritable.

• q is equivalent under the key constraint to the 1-Cycle query or to a path query.

We will first show that if q is the 1-Cycle query or a path query, then CERTAINTY(q) is FO-rewritable.
For the 1-Cycle query $\exists x\,R(x,x)$, it is easy to verify that the sentence $\exists x\,(R(x,x) \land \forall y\,(x \neq y \rightarrow \neg R(x,y)))$
is a first-order rewriting of CERTAINTY(1-Cycle). Indeed, if an instance R satisfies the preceding sentence,
then there is a node a such that the only edge coming out of a is the self-loop R(a,a). Hence, every repair
of R must contain the fact R(a,a), which means that every repair of R contains a self-loop. Conversely, if
every repair of R contains a self-loop, then R must satisfy the sentence $\exists x\,(R(x,x) \land \forall y\,(x \neq y \rightarrow \neg R(x,y)))$.
Otherwise, for every node a such that R(a,a) is a fact of R, there is a node b ≠ a such that R(a,b) is a fact
of R, and we could construct a repair R' of R that contains no self-loops by putting one such fact R(a,b)
in it for each such node a.
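
The rewriting for the 1-Cycle query can be compared with the definition of CERTAINTY on small instances. The sketch below assumes instances are Python sets of pairs keyed on the first component; the names `rewriting_1cycle` and `certain_self_loop` are hypothetical. The first function evaluates the first-order sentence directly, the second enumerates all repairs.

```python
from itertools import product


def rewriting_1cycle(instance):
    """Evaluate: exists x with R(x, x) such that every fact R(x, y) has y = x."""
    return any(a == b and all(c != a or d == a for c, d in instance)
               for a, b in instance)


def certain_self_loop(instance):
    """Brute force: does every repair contain a self-loop?"""
    blocks = {}
    for a, b in instance:
        blocks.setdefault(a, []).append((a, b))
    return all(any(a == b for a, b in repair)
               for repair in product(*blocks.values()))


# Node 1 can drop its self-loop in a repair, node 2 cannot.
print(rewriting_1cycle({(1, 1), (1, 2)}), rewriting_1cycle({(1, 1), (1, 2), (2, 2)}))
```

Checking agreement of the two functions over all instances on a small node set gives a quick sanity test of the argument above.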

As regards path queries, note that Fuxman and Miller identified a class, called $C_{forest}$, of self-join-free
conjunctive queries and showed that if q is a query in $C_{forest}$, then CERTAINTY(q) is FO-rewritable. The
class $C_{forest}$ includes as a member every query $q_n$, n ≥ 2, of the form
$\exists x_1 \ldots x_n\,(S_1(x_1,x_2) \land S_2(x_2,x_3) \land \cdots \land S_{n-1}(x_{n-1},x_n))$, where the relation symbols $S_i$ are distinct. In general, the first-order rewriting algorithm
for queries in $C_{forest}$ fails if it is applied to conjunctive queries with self-joins. It can be shown, however, that
this algorithm produces a correct first-order rewriting when applied to the queries n-PATH, n ≥ 2. Here, we
give a direct proof of this result.

Theorem 3. Let R be a schema consisting of a single binary relation symbol with the first attribute as the 
key. If q is a path query, then CERTAINTY(q) is FO-rewritable.

Proof. We begin by giving the first-order rewriting of CERTAINTY(2-Path). Let $\psi_2$ be the first-order sentence

$\exists x, y, z\,[R(x,y) \land R(y,z) \land \forall y\,(R(x,y) \rightarrow \exists z\,R(y,z))]$.

We claim that $\psi_2$ is a first-order rewriting of CERTAINTY(2-Path). Intuitively, $\psi_2$ asserts that there is a
path of length 2 in the database and, moreover, whenever we replace in some repair the first edge of this
path with another edge whose endpoint is a node u, then there is an edge starting from this node u. This
ensures that every repair contains a path of length 2.

More formally, suppose first that an instance R satisfies $\psi_2$, and that R' is a repair of R. Then there
are nodes a, b, c such that R(a,b) and R(b,c) are facts of R. It follows that R' must contain a fact of the
form R(a,b') for some node b'. Since R satisfies $\psi_2$, there is a node d such that R(b',d) is a fact of R.
Consequently, R' must contain a fact of the form R(b',c''), hence R' contains the path R(a,b'), R(b',c'').
Next, assume that R does not satisfy $\psi_2$. We will show how to construct a repair R' of R that contains
no path of length 2. If a is a node for which there is a fact R(a,b) of R such that there is no fact of the
form R(b,c) in R, then we pick one such b and put R(a,b) in R'. Since $\psi_2$ is false on R, if a, b and c are
three nodes such that R(a,b) and R(b,c) are facts of R, then there is a b' such that R(a,b') is a fact of R,
but there is no node c' such that R(b',c') is a fact of R. In this case, we add R(a,b') to the repair R'. We
continue doing the same for all nodes a that are the beginning of a path of length 2 in R. This construction
produces a repair R' of R that contains no path of length 2.

Next, let $\psi_3$ be the first-order sentence

$\exists x, y, z, w\,[R(x,y) \land R(y,z) \land R(z,w) \land \forall y\,(R(x,y) \rightarrow$

$\quad \exists z, w\,[R(y,z) \land R(z,w) \land \forall w'\,(R(y,w') \rightarrow \exists z'\,R(w',z'))])]$

Observe that the subformula of $\psi_3$ shown in the second row is essentially the formula $\psi_2$, except that an
existential quantifier is missing in the front.

It is not hard to verify that $\psi_3$ is a first-order rewriting of CERTAINTY(3-Path). The intuition is analogous
to that for $\psi_2$, namely, $\psi_3$ asserts that there is a path of length 3 in the database and, moreover, when we
replace in some repair the first edge with another edge whose endpoint is a node u, then there is a path of
length 2 starting from the node u. This ensures that every repair contains a path of length 3.

A first-order rewriting $\psi_n$ of CERTAINTY(n-Path), for n > 3, can be obtained via an inductive definition
that is similar to the way $\psi_3$ was obtained from $\psi_2$. Specifically, the first part asserts the existence of a
path of length n and the rest of $\psi_n$ is essentially the first-order rewriting $\psi_{n-1}$ of CERTAINTY((n − 1)-Path),
except that an existential quantifier is missing in the front. □

Figure 1: Databases $D_1$ and $D_2$
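
The inductive shape of the rewritings in the proof of Theorem 3 can be sketched as a recursive check. The sketch assumes that "a path of length k" means a path with k edges, as in the sentences for lengths 2 and 3 above, and that instances are Python sets of pairs keyed on the first component; the names `certain_path_by_rewriting`, `good`, and `certain_path_brute_force` are hypothetical. The predicate `good(x, j)` says that x has an outgoing edge and every node reachable from x by one edge satisfies `good` for j − 1.

```python
from itertools import product


def certain_path_by_rewriting(instance, k):
    """Evaluate the inductively defined rewriting for a path with k >= 1 edges."""
    out = {}
    for a, b in instance:
        out.setdefault(a, set()).add(b)

    def good(x, j):
        if j == 0:
            return True
        # x needs an outgoing edge, and every choice of edge must lead to a good node.
        return x in out and all(good(y, j - 1) for y in out[x])

    return any(good(x, k) for x in out)


def certain_path_brute_force(instance, k):
    """Check by enumerating repairs that each one contains a path with k edges."""
    blocks = {}
    for a, b in instance:
        blocks.setdefault(a, []).append((a, b))

    def has_path(repair):
        succ = dict(repair)
        for u in succ:
            steps = 0
            while u in succ and steps < k:
                u, steps = succ[u], steps + 1
            if steps == k:
                return True
        return False

    return all(has_path(repair) for repair in product(*blocks.values()))


R = {(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)}
print(certain_path_by_rewriting(R, 3), certain_path_brute_force(R, 3))
```

The recursive check runs in time polynomial in the instance for fixed k, whereas the brute-force enumeration is included only as a reference point on small inputs.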

Next, we focus on queries that are a disjoint collection of cycles each of which has length at least 2. 

Theorem 4. Let R be a schema consisting of a single binary relation symbol with the first attribute as the 
key. Assume that q is a disjoint collection of cycles such that each cycle in the collection has length at least
2, and the length of each cycle in the collection does not divide the length of any other cycle in the collection.
Then CERTAINTY(q) is not FO-rewritable.

Proof. We first prove the result for the case in which the query is a single cycle of length at least 2; in
other words, we will show that if n ≥ 2, then CERTAINTY(n-CYCLE) is not FO-rewritable. The proof uses
the technique of Ehrenfeucht-Fraïssé games (see [18] for an exposition). For concreteness, we provide the
details for CERTAINTY(2-Cycle) and for CERTAINTY(3-Cycle); the generalization to cycles of bigger length
will become clear from the constructions in these two cases.

For the 2-Cycle query, let $D_1$ and $D_2$ be the database instances depicted in Figure 1.

• Database Instance $D_1$: There are two disjoint simple paths of “double” edges, say R(u,v) and R(v,u),
each of which forms a 2-cycle. For the first path, there are two simple edges going out from two nodes
that are “far apart” and also “far” from the endpoints of the path. For the second path, there are
two simple edges entering at two nodes on the path that are “far apart”.

• Database Instance $D_2$: As in $D_1$, there are two disjoint simple paths of “double” edges, say R(u,v) and
R(v,u), each of which forms a 2-cycle. For the first path, there is one simple edge entering and one
simple edge going out at nodes that are “far apart”, and also “far” from the endpoints of the path. The
second path is a copy of the first path of $D_2$.

We claim that every repair of $D_1$ satisfies the 2-Cycle query, while there is a repair of $D_2$ on which
the 2-Cycle query is false. To see this, consider first a simple path of “double” edges (with no ingoing
or outgoing simple edges at some node). In such a path, all nodes have outdegree 2, except for the two
endpoints of the path. Thus, the edges emanating from the two endpoints must be included in every repair
of the path. From this, it follows that every repair must contain a cycle of length 2, since, if one tries to build
a repair that avoids 2-cycles, then one ends up including a cycle of length 2 at one of the two endpoints.
The situation remains the same if we augment the path with simple ingoing edges. From this, it follows that
every repair of $D_1$ contains a 2-cycle in the right component of $D_1$. If, however, we augment the path with
at least one simple outgoing edge, then we can construct a repair of the path that has no 2-cycles. Since
both components of $D_2$ have an outgoing simple edge, it follows that $D_2$ has a repair that has no 2-cycles.

Let us call a node in $D_1$ or in $D_2$ special if, in addition to the edges of the 2-cycle, it has an ingoing or
outgoing simple edge. Fix a positive integer m and consider the m-move Ehrenfeucht-Fraïssé game on two
instances that have the same shape as $D_1$ and $D_2$. If the distance between the two special nodes in each
component is large enough, then it is easy to see that the Duplicator wins the m-move Ehrenfeucht-Fraïssé
game on these two instances, because when the Spoiler plays close to a special node in one of the instances,
then the Duplicator can play on a similar node (i.e., with an outgoing or ingoing extra simple edge) in the
other instance. Consequently, CERTAINTY(2-Cycle) is not FO-rewritable.

For the 3-Cycle query, consider a sequence of consecutive 3-cycles $R(a,b_1)$, $R(b_1,c_1)$, $R(c_1,a)$; $R(c_1,b_2)$,
$R(b_2,c_2)$, $R(c_2,c_1)$; $R(c_2,b_3)$, $R(b_3,c_3)$, $R(c_3,c_2)$, and so on, until the last 3-cycle, say, $R(c_m,b_{m+1})$,
$R(b_{m+1},a')$, $R(a',c_m)$. In this instance, only the nodes $c_1, c_2, \ldots$ have outdegree bigger than one, so all other
edges must be in every repair. Consider a repair of this instance. If it contains $R(c_1,a)$ or some edge
$R(c_j,c_{j-1})$, then the repair contains a 3-cycle. The best chance to eliminate all 3-cycles in a repair is to
remove $R(c_1,a)$, $R(c_2,c_1)$, ..., $R(c_m,c_{m-1})$. But then we must keep the edges $R(c_m,b_{m+1})$, $R(b_{m+1},a')$,
$R(a',c_m)$, which form a 3-cycle.

Now, if we have incoming edges to some of the $c_j$'s, it will still be the case that every repair contains
a 3-cycle. On the other hand, even a single outgoing edge from some $c_i$ allows us to obtain a repair that has no
3-cycles by keeping this outgoing edge from $c_i$ and removing the edges $R(c_i,c_{i-1})$, $R(c_i,b_{i+1})$, and ultimately
removing the edge $R(c_m,b_{m+1})$, thus eliminating all 3-cycles in the process.

The rest then is as for the instances with the 2-Cycle query. We form two instances $D_1$ and $D_2$ with
two chains of triangles in each, and ingoing and outgoing edges to some nodes $c_i$ and $c_j$ as in Figure 1, and
use Ehrenfeucht-Fraïssé games to conclude that CERTAINTY(3-Cycle) is not FO-rewritable.

Finally, we need to consider disjoint collections of cycles. For concreteness, let q be a query made up of
three cycles $C_1$, $C_2$, $C_3$ such that the length of each of these cycles does not divide the length of any one of
the other cycles (in particular, the length of each cycle is at least 2). Consider the instances $D_1$ and $D_2$
for the query associated with the cycle $C_1$, and form the instances $D_1' = D_1 \oplus C_2 \oplus C_3$ and $D_2' = D_2 \oplus C_2 \oplus C_3$
obtained by forming the disjoint union of $D_1$, $C_2$, $C_3$, and the disjoint union of $D_2$, $C_2$, $C_3$. It is not hard to
show that every repair of $D_1'$ satisfies the query q, while there is a repair of $D_2'$ that does not. Moreover,
for every positive integer m, we can construct instances that have the shape of $D_1'$ and $D_2'$, and are such
that the Duplicator wins the m-move Ehrenfeucht-Fraïssé game played on these instances. Consequently,
CERTAINTY(q) is not FO-rewritable. □

Theorem 2 now follows from Theorems 1, 3, and 4 and the earlier remarks about the 1-Cycle query.


5. Polynomial-Time Computability 

Let R be a schema consisting of a single binary relation symbol with the first attribute as the key. In this 
section, we show that if q is a boolean conjunctive query over R, then CERTAINTY(q) is in PTIME. Clearly,
if CERTAINTY(q) is FO-rewritable, then CERTAINTY(q) is in PTIME. Thus, in view of Theorems 1 and 2, it
suffices to show that CERTAINTY(q) is in PTIME whenever q is a disjoint collection of cycles such that the
length of each cycle in the collection is at least 2 and it does not divide the length of any other cycle in the 
collection. This will be accomplished in a series of steps that build to the main result. 

Before we proceed, we need to recall the following facts from graph theory, which will be useful in some 
of the proofs. If G is a graph, then the strongly connected components of G form a partition of the set of 
nodes of G. If each strongly connected component is contracted to a single node, the resulting graph is a 
directed acyclic graph, called the condensation graph of G. The strongly connected components that are 
contracted into sink nodes in the condensation graph of G are called sink strongly connected components. 
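
The notion of a sink strongly connected component can be illustrated with a small sketch. It assumes an instance is a Python set of pairs read as directed edges; the name `sink_sccs` is hypothetical, and the reachability-based computation is chosen for clarity rather than efficiency (a linear-time algorithm such as Tarjan's could be substituted).

```python
def sink_sccs(instance):
    """Return the sink strongly connected components as a set of frozensets of nodes."""
    nodes = {x for fact in instance for x in fact}
    out = {u: set() for u in nodes}
    for a, b in instance:
        out[a].add(b)

    def reachable(u):
        seen, stack = {u}, [u]
        while stack:
            for w in out[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    reach = {u: reachable(u) for u in nodes}
    # Two nodes are in the same component when each reaches the other.
    sccs = {frozenset(v for v in reach[u] if u in reach[v]) for u in nodes}
    # A component is a sink when nothing outside it is reachable from it.
    return {s for s in sccs if all(reach[u] <= s for u in s)}


G = {(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (5, 5), (6, 5)}
print(sorted(sorted(s) for s in sink_sccs(G)))
```

Under this definition, a node without outgoing edges forms a sink component on its own, so an instance consisting of the single edge from 1 to 2 has the sink component {2}.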

We start with proving that CERTAINTY(n-CYCLE), n ≥ 2, is in PTIME. For this, we need some lemmas in
which we always assume that we have a schema consisting of a single binary relation with the first attribute 
as a key. 

Lemma 1. If D is a repair of an instance R, then every cycle in D is simple. 

Proof. Assume that D is a repair of R such that D contains a cycle C that is not simple. This means that C 
contains a node u with outgoing edges to two distinct nodes $v_1$ and $v_2$. Consequently, D contains the facts
$R(u,v_1)$ and $R(u,v_2)$, hence D violates the key constraint, which is a contradiction. □

Lemma 2. If D is a repair of an instance R and if S is a sink strongly connected component of R, then the 
intersection D ∩ S contains a simple cycle.

Proof. Let u be a node in D ∩ S. Since S is a sink strongly connected component of R, every edge outgoing
from u must lead to a node in S. Therefore, there is a node v such that R(u,v) is a fact of D ∩ S. By the
same reasoning, there is a node w such that R(v,w) is in D ∩ S, and so on; thus, we obtain a path in which
every edge is in D ∩ S. Since D ∩ S is finite, at some point we will encounter for the first time a node that
has been visited earlier in the path, hence D ∩ S contains a simple cycle. □

Lemma 3. If R is an instance, S is a strongly connected component of R , and C is a simple cycle in S, 
then there is a repair D of R such that D ∩ S contains the simple cycle C and no other simple cycle.

Proof. First, we build a repair D_1 that contains the simple cycle C. For this, we include the simple cycle 
C in the repair and then we keep adding edges until no more edges can be added while, at the same time, 
satisfying the key constraint of R. 

Second, we build a repair D such that D ∩ S contains the simple cycle C and no other simple cycle. For 
this, we start with the repair D_1 from the previous step, for which we have that D_1 ∩ S contains the simple cycle 
C. Suppose that D_1 ∩ S contains another simple cycle C'. Note that C and C' have no nodes in common, 
since, by Lemma 1, every cycle in D_1 is simple. We construct another repair D_2 by “breaking” C' as follows. 
The strongly connected component S contains a node u on C' such that there is a path p from u to a node 
of C. We delete the outgoing edge from u that belongs to C' and add the edge e = (u, v) of u that is on 
the path p. If the newly added edge e has as endpoint a node on C, then it does not create a new cycle. 
Otherwise, it may create a new cycle C''; however, there is now a path from v to a node of C that is shorter 
than p (in fact, this path is obtained from p by deleting the edge e); this way, we continue breaking the next 
cycle (if there is one) until no cycles other than C are left. □ 
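
As an illustration of Lemma 3, the following Python sketch builds the part of such a repair that lies inside S. It is a variant of the argument above, not the construction in the proof: instead of breaking cycles one at a time, every node outside C directly keeps an edge that lies on a shortest path to C, so that no cycle other than C can arise. The adjacency dictionary `adj`, the list `cycle`, and the function name `repair_with_single_cycle` are assumptions made for this example.

```python
from collections import deque

def repair_with_single_cycle(adj, component, cycle):
    """Choose one outgoing edge per node of `component` so that `cycle`
    is the only cycle among the chosen edges.

    adj: dict node -> set of successors (edges of the instance R).
    component: set of nodes of a strongly connected component S.
    cycle: list of nodes of a simple cycle C in S, in cycle order.
    Returns a dict node -> chosen successor (the repair restricted to S).
    """
    # Nodes on C keep their cycle edge.
    choice = {cycle[i]: cycle[(i + 1) % len(cycle)] for i in range(len(cycle))}
    # Predecessor lists restricted to the component.
    preds = {}
    for u in component:
        for v in adj.get(u, ()):
            if v in component:
                preds.setdefault(v, []).append(u)
    # Backward breadth-first search from C: each newly reached node keeps
    # the edge to a node that is strictly closer to C.
    queue = deque(cycle)
    while queue:
        v = queue.popleft()
        for u in preds.get(v, ()):
            if u not in choice:
                choice[u] = v
                queue.append(u)
    return choice
```

Since S is strongly connected, every node of S is reached by the backward search, so every key in S receives exactly one fact.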

Lemma 4. Every instance R has a repair D in which the only simple cycles are in sink strongly connected 
components. 

Proof. Towards building the desired repair D, we start by building a repair D' that has a simple cycle in 
each non-sink strongly connected component. Actually, for every strongly connected component S, we can 
choose any cycle we want in D' ∩ S (as per Lemma 3), so we choose a cycle that has a special node u such 
that there is an edge from u to a strongly connected component at the next level of the condensation graph 
of R (recall that the condensation graph is a directed acyclic graph). We now build D from D' by “breaking” 
cycles as follows. For every non-sink strongly connected component S, we take the special node u in D' ∩ S, and we 
remove the edge outgoing from u in the cycle and add the edge that goes from u to the next level. This new 
edge does not create a cycle because inter-level edges do not belong to any cycle. □ 

We are now ready to state and prove the main technical result of this section. 

Theorem 5. Consider the n-CYCLE query, where n ≥ 2. Then the following statements are true. 

• Every repair of an instance R satisfies the n-CYCLE query if and only if there is a sink strongly connected component 
S of R such that every simple cycle of S is a homomorphic image of an n-cycle. 

• CERTAINTY(n-CYCLE) is in PTIME. 

Proof. For the first part of the theorem, recall that, by Lemma 1, the only images of an n-cycle that are 
consistent instances are simple cycles. For the “if” direction, suppose that there is a sink strongly connected 
component such that every simple cycle is a homomorphic image of an n-cycle. By Lemma 2, every repair 
contains a cycle from each sink strongly connected component, hence every repair satisfies the n-CYCLE 
query. For the “only if” direction, if there is no sink strongly connected component with the property that 
all its simple cycles are homomorphic images of an n-cycle, then, by Lemmas 3 and 4, we can build a repair 
D that contains no homomorphic image of an n-cycle, hence D does not satisfy the n-CYCLE query. 

For the second part of the theorem, we need to prove that the property in the first part of the theorem 
can be checked in polynomial time. Consider the following algorithm: 

For each sink strongly connected component, do: 

1. To check that there is no simple cycle with more than n nodes: 

(a) Examine each n-tuple of nodes (a_1, ..., a_n) and test whether they form a simple path. 

(b) If they do, check whether there is a path from a_n to a_1 that passes through at least one further node and does not contain any of the other nodes a_2, ..., a_{n-1}. 

Since n is a fixed number, this last check can be done in polynomial time and, in fact, even in Datalog 
with inequalities [7]. If such a disjoint path exists, then there is a simple cycle with more than n nodes 
through (a_1, ..., a_n). Otherwise, there is no simple cycle with more than n nodes through (a_1, ..., a_n). 

2. For cycles with fewer than n nodes, we check exhaustively in polynomial time all combinations of k < n 
nodes to see whether they form a cycle that is not a homomorphic image of an n-cycle. 

Clearly, this algorithm runs in time bounded by a polynomial in the size of R (the degree of the polynomial 
depends on n , which is a fixed number). □ 
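
The algorithm in the proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than an optimized implementation: the instance is given as a list of pairs `edges`, all function names are ours, and strongly connected components are computed naively by mutual reachability. The sketch relies on the standard fact that a directed cycle with k nodes is a homomorphic image of the n-cycle exactly when k divides n, and it treats a sink component without edges as not satisfying the condition, since such a component contributes no cycle to any repair.

```python
from itertools import permutations

def reachable(adj, src, allowed):
    """Nodes reachable from src (inclusive) while staying inside `allowed`."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def sink_components(adj):
    """Strongly connected components with no edge leaving them."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    reach = {u: reachable(adj, u, nodes) for u in nodes}
    comps = {frozenset(v for v in reach[u] if u in reach[v]) for u in nodes}
    return [c for c in comps if all(v in c for u in c for v in adj.get(u, ()))]

def has_long_cycle(adj, comp, n):
    """Step 1: is there a simple cycle in comp with more than n nodes?"""
    for tup in permutations(comp, n):
        if any(tup[i + 1] not in adj.get(tup[i], ()) for i in range(n - 1)):
            continue  # (a_1, ..., a_n) is not a simple path
        free = comp - set(tup)  # nodes allowed on the way back to a_1
        for w in adj.get(tup[-1], ()):
            if w in free and any(tup[0] in adj.get(x, ())
                                 for x in reachable(adj, w, free)):
                return True
    return False

def has_bad_short_cycle(adj, comp, n):
    """Step 2: is there a simple cycle with k < n nodes, k not dividing n?"""
    for k in range(1, n):
        if n % k == 0:
            continue  # such cycles are homomorphic images of the n-cycle
        for tup in permutations(comp, k):
            if all(tup[(i + 1) % k] in adj.get(tup[i], ()) for i in range(k)):
                return True
    return False

def certainty_n_cycle(edges, n):
    """Does every repair of the instance satisfy the n-CYCLE query?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    for comp in sink_components(adj):
        has_edge = any(adj.get(u) for u in comp)
        if (has_edge and not has_long_cycle(adj, comp, n)
                and not has_bad_short_cycle(adj, comp, n)):
            return True
    return False
```

For instance, `certainty_n_cycle([(1, 2), (2, 1)], 2)` holds, whereas adding the edge `(2, 3)` makes the component on 1 and 2 a non-sink component and the answer becomes negative. The tuple enumeration is polynomial only because n is fixed, in line with the remark on the degree of the polynomial.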

Finally, we consider queries that are a disjoint collection of simple cycles. 

Theorem 6. Let q be a Boolean conjunctive query that is a disjoint collection of cycles C_1, ..., C_m such that 
the length of each cycle in the collection is at least 2 and does not divide the length of any other cycle in 
the collection. Then the following statements are true. 

• Every repair of an instance R satisfies the query q if and only if for every cycle C_i, 1 ≤ i ≤ m, in q, 
there exists a sink strongly connected component of R such that all its simple cycles are homomorphic 
images of C_i. 

• CERTAINTY(q) is in PTIME. 

Proof. For the “if” direction of the first part of the theorem, since every repair D of R contains a cycle from 
each sink strongly connected component, we have that D contains a homomorphic image of the cycle C_i, for 
each i ≤ m. For the “only if” direction of the first part of the theorem, suppose that there is a cycle C_i such 
that no sink strongly connected component contains a homomorphic image of it. Lemmas 3 and 4 yield a 
repair that does not satisfy the query q. 

For the second part of the theorem, we need to show that the condition in the first part of the theorem 
can be checked in polynomial time. This is an easy consequence of the second part of Theorem 5. □ 
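
To make the condition of Theorem 6 concrete, here is a small Python sketch. It assumes, purely for illustration, that the query is given by the list of its cycle lengths and that, for each sink strongly connected component, the set of lengths of its simple cycles is already available; the function names and this input encoding are ours, and the sketch does not itself compute those sets. As before, a cycle with k nodes is treated as a homomorphic image of a cycle with n nodes exactly when k divides n.

```python
def satisfies_hypothesis(lengths):
    """Hypothesis of Theorem 6: every length is at least 2 and no length
    divides the length of another cycle in the collection."""
    return all(a >= 2 for a in lengths) and all(
        b % a != 0
        for i, a in enumerate(lengths)
        for j, b in enumerate(lengths)
        if i != j
    )

def certain(query_lengths, sink_cycle_lengths):
    """Condition of the first part of Theorem 6.

    query_lengths: lengths of the cycles C_1, ..., C_m of the query.
    sink_cycle_lengths: one non-empty set of simple-cycle lengths per sink
    strongly connected component (assumed to be given; empty sets are
    treated as components without cycles and never qualify).
    """
    return all(
        any(lens and all(n % k == 0 for k in lens) for lens in sink_cycle_lengths)
        for n in query_lengths
    )
```

For example, a query with cycles of lengths 2 and 3 satisfies the hypothesis, and it is certain on an instance whose two sink components contain only 2-cycles and only 3-cycles, respectively.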

We conclude the paper by combining Theorems Hi and [ 6 ] into a single result. 

Theorem 7. Let R be a relational schema consisting of a single binary relation R in which the first attribute 
is the key. If q is a Boolean conjunctive query over R, then CERTAINTY(q) is in PTIME. Moreover, exactly 
one of the following two possibilities holds: 

• The query q is FO-rewritable, and it is equivalent under the key constraint to the 1-CYCLE query or to a 
path query. 

• The query q is not FO-rewritable, and it is equivalent under the key constraint to a disjoint collection of 
cycles such that the length of each cycle is at least 2 and does not divide the length of any other cycle 
in the collection. 